Improving Name Discrimination: A Language Salad Approach

Authors

  • Ted Pedersen
  • Anagha Kulkarni
  • Roxana Angheluta
  • Zornitsa Kozareva
  • Thamar Solorio

Abstract

This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian, Romanian, and Spanish corpora using information derived from much larger quantities of English data. We also mix together occurrences of the ambiguous name found in English with the occurrences of the name in the language in which we are trying to discriminate. We refer to this as a language salad, and find that it often results in even better performance than when only using English or the language itself as the source of information for discrimination.
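The abstract does not say which features or clustering algorithm are used. As a rough illustration only, the Python sketch below assumes whitespace-tokenized unigram features, binary context vectors, and plain k-means. These choices, the function names (`select_features`, `vectorize`, `kmeans`, `discriminate`), and the toy "Paris" data are hypothetical and are not taken from the paper. The sketch shows the one idea the abstract does state: the feature set can be drawn from English contexts, from the target language's own contexts, or from a mixture of both (the language salad), while the contexts being clustered are always those in the target language.

```python
from collections import Counter

# Hypothetical sketch: unigram features, binary vectors and k-means are
# illustrative assumptions, not the method specified in the paper.


def select_features(contexts, name, top_n=20, min_count=2):
    """Most frequent lowercase tokens in `contexts`, excluding the ambiguous name."""
    counts = Counter(
        tok for ctx in contexts for tok in ctx.lower().split() if tok != name.lower()
    )
    return [tok for tok, c in counts.most_common(top_n) if c >= min_count]


def vectorize(context, features):
    """Binary vector: 1.0 where a feature token occurs in the context."""
    toks = set(context.lower().split())
    return [1.0 if f in toks else 0.0 for f in features]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Add the vector farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def discriminate(target_contexts, name, k, feature_contexts):
    """Cluster target-language contexts of `name` using features from `feature_contexts`."""
    features = select_features(feature_contexts, name)
    return kmeans([vectorize(ctx, features) for ctx in target_contexts], k)


# Toy, invented data: "Paris" as a person vs. a city.
english = [
    "Paris Hilton hotel heiress TV show",
    "Paris Hilton TV show celebrity",
    "Paris France Eiffel Louvre museum",
    "Paris France Louvre Seine",
]
spanish = [
    "Paris Hilton en un show de TV",
    "la actriz Paris Hilton y su show",
    "Paris capital de France con el Louvre",
    "el Louvre de Paris en France",
]

print(discriminate(spanish, "Paris", 2, english))            # English-only features: [0, 0, 1, 1]
print(discriminate(spanish, "Paris", 2, spanish))            # target-language features: [0, 0, 1, 1]
print(discriminate(spanish, "Paris", 2, english + spanish))  # language salad: [0, 0, 1, 1]
```

Changing the `feature_contexts` argument is the only difference between the three settings the abstract compares. In this toy data all three separate the person contexts from the city contexts, because shared tokens such as proper nouns let English-derived features apply to Spanish text; the toy example makes no claim about which setting performs best on real corpora.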

Publication date: 2006